Overview

Dataset statistics

Number of variables: 10
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 781.4 KiB
Average record size in memory: 80.0 B

Variable types

Numeric: 9
Categorical: 1

Alerts

Pregnancies is highly correlated with Diabetic (High correlation)
BMI is highly correlated with Diabetic (High correlation)
Age is highly correlated with Diabetic (High correlation)
Diabetic is highly correlated with Pregnancies and 2 other fields (High correlation)
BMI has unique values (Unique)
DiabetesPedigree has unique values (Unique)
Pregnancies has 2879 (28.8%) zeros (Zeros)

Reproduction

Analysis started: 2022-01-21 18:58:31.140480
Analysis finished: 2022-01-21 18:58:41.351529
Duration: 10.21 seconds
Software version: pandas-profiling v3.1.1
Download configuration: config.json

Variables

PatientID
Real number (ℝ≥0)

Distinct: 9959
Distinct (%): 99.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1502122.083
Minimum: 1000038
Maximum: 1999997
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 78.2 KiB

Quantile statistics

Minimum: 1000038
5th percentile: 1052241.75
Q1: 1251672.25
Median: 1504394
Q3: 1754607.5
95th percentile: 1951606.9
Maximum: 1999997
Range: 999959
Interquartile range (IQR): 502935.25

Descriptive statistics

Standard deviation: 289286.7648
Coefficient of variation (CV): 0.1925853885
Kurtosis: -1.199849714
Mean: 1502122.083
Median Absolute Deviation (MAD): 251150.5
Skewness: -0.00477009371
Sum: 1.502122083 × 10^10
Variance: 8.368683229 × 10^10
Monotonicity: Not monotonic
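Several of the descriptive statistics above are derived from others. As a minimal plain-Python sketch, using the rounded PatientID figures printed in this report (so agreement holds only up to rounding), the coefficient of variation, range, interquartile range, sum and variance can be recomputed like this:

```python
# Rounded PatientID figures copied from this report.
n = 10000
mean = 1502122.083
std = 289286.7648
minimum, maximum = 1000038, 1999997
q1, q3 = 1251672.25, 1754607.5

cv = std / mean              # coefficient of variation = standard deviation / mean
value_range = maximum - minimum
iqr = q3 - q1                # interquartile range = Q3 - Q1
total = mean * n             # sum = mean * number of observations
variance = std ** 2          # variance = (standard deviation) squared

print(f"CV={cv:.10f} range={value_range} IQR={iqr} sum={total:.9e} variance={variance:.9e}")
```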
Histogram with fixed size bins (bins=50)
Common Values
Value | Count | Frequency (%)
1623043 | 2 | < 0.1%
1819876 | 2 | < 0.1%
1224602 | 2 | < 0.1%
1772038 | 2 | < 0.1%
1830191 | 2 | < 0.1%
1455760 | 2 | < 0.1%
1109455 | 2 | < 0.1%
1541930 | 2 | < 0.1%
1407053 | 2 | < 0.1%
1184651 | 2 | < 0.1%
Other values (9949) | 9980 | 99.8%

Minimum 10 values
Value | Count | Frequency (%)
1000038 | 1 | < 0.1%
1000183 | 1 | < 0.1%
1000326 | 1 | < 0.1%
1000340 | 1 | < 0.1%
1000471 | 1 | < 0.1%
1000510 | 1 | < 0.1%
1000652 | 1 | < 0.1%
1000869 | 1 | < 0.1%
1000963 | 1 | < 0.1%
1001229 | 1 | < 0.1%

Maximum 10 values
Value | Count | Frequency (%)
1999997 | 1 | < 0.1%
1999864 | 1 | < 0.1%
1999836 | 1 | < 0.1%
1999319 | 1 | < 0.1%
1999250 | 1 | < 0.1%
1999214 | 1 | < 0.1%
1999201 | 1 | < 0.1%
1999183 | 1 | < 0.1%
1998989 | 1 | < 0.1%
1998962 | 1 | < 0.1%

Pregnancies
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 15
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.2558
Minimum: 0
Maximum: 14
Zeros: 2879
Zeros (%): 28.8%
Negative: 0
Negative (%): 0.0%
Memory size: 78.2 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 2
Q3: 6
95th percentile: 9
Maximum: 14
Range: 14
Interquartile range (IQR): 6

Descriptive statistics

Standard deviation: 3.405719638
Coefficient of variation (CV): 1.046046943
Kurtosis: -0.5290873777
Mean: 3.2558
Median Absolute Deviation (MAD): 2
Skewness: 0.8107787892
Sum: 32558
Variance: 11.59892625
Monotonicity: Not monotonic
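Because Pregnancies takes only 15 distinct values, its summary statistics can be checked against the value counts listed in this section (the common and extreme value tables together cover all 15 values and sum to 10000). A minimal sketch, assuming linear interpolation between ranks for quantiles (pandas' default for `Series.quantile`); `quantile` is a hypothetical helper written for this example, not a library function:

```python
import statistics

# Pregnancies value counts, read off the common and extreme value tables in this report.
counts = {0: 2879, 1: 1932, 2: 623, 3: 784, 4: 454, 5: 456, 6: 720, 7: 606,
          8: 446, 9: 602, 10: 298, 11: 87, 12: 43, 13: 49, 14: 21}

# Expand the counts into one sorted value per observation.
values = [v for v, c in sorted(counts.items()) for _ in range(c)]
n = len(values)


def quantile(sorted_values, q):
    """Hypothetical helper: quantile by linear interpolation between the two nearest ranks."""
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (pos - lo)


mean = statistics.mean(values)
median = statistics.median(values)
variance = statistics.variance(values)   # sample variance: divides by n - 1
stdev = statistics.stdev(values)
mad = statistics.median(abs(v - median) for v in values)  # median absolute deviation
q1, q3, p95 = (quantile(values, q) for q in (0.25, 0.75, 0.95))
zeros_pct = 100 * counts[0] / n

print(n, sum(values), mean, median, mad, q1, q3, p95, variance, stdev, zeros_pct)
```

The variance only matches the reported 11.59892625 with the n − 1 (sample) denominator, which is what `statistics.variance` uses.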
Histogram with fixed size bins (bins=15)
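Common Values
Value | Count | Frequency (%)
0 | 2879 | 28.8%
1 | 1932 | 19.3%
3 | 784 | 7.8%
6 | 720 | 7.2%
2 | 623 | 6.2%
7 | 606 | 6.1%
9 | 602 | 6.0%
5 | 456 | 4.6%
4 | 454 | 4.5%
8 | 446 | 4.5%
Other values (5) | 498 | 5.0%

Minimum 10 values
Value | Count | Frequency (%)
0 | 2879 | 28.8%
1 | 1932 | 19.3%
2 | 623 | 6.2%
3 | 784 | 7.8%
4 | 454 | 4.5%
5 | 456 | 4.6%
6 | 720 | 7.2%
7 | 606 | 6.1%
8 | 446 | 4.5%
9 | 602 | 6.0%

Maximum 10 values
Value | Count | Frequency (%)
14 | 21 | 0.2%
13 | 49 | 0.5%
12 | 43 | 0.4%
11 | 87 | 0.9%
10 | 298 | 3.0%
9 | 602 | 6.0%
8 | 446 | 4.5%
7 | 606 | 6.1%
6 | 720 | 7.2%
5 | 456 | 4.6%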

PlasmaGlucose
Real number (ℝ≥0)

Distinct: 149
Distinct (%): 1.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 107.8502
Minimum: 44
Maximum: 192
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 78.2 KiB

Quantile statistics

Minimum: 44
5th percentile: 57
Q1: 84
Median: 105
Q3: 129
95th percentile: 168
Maximum: 192
Range: 148
Interquartile range (IQR): 45

Descriptive statistics

Standard deviation: 31.92090936
Coefficient of variation (CV): 0.2959745032
Kurtosis: -0.5353944505
Mean: 107.8502
Median Absolute Deviation (MAD): 22
Skewness: 0.3260878604
Sum: 1078502
Variance: 1018.944454
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
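Common Values
Value | Count | Frequency (%)
97 | 171 | 1.7%
96 | 164 | 1.6%
118 | 146 | 1.5%
107 | 141 | 1.4%
119 | 134 | 1.3%
101 | 131 | 1.3%
85 | 131 | 1.3%
95 | 131 | 1.3%
89 | 130 | 1.3%
116 | 127 | 1.3%
Other values (139) | 8594 | 85.9%

Minimum 10 values
Value | Count | Frequency (%)
44 | 14 | 0.1%
45 | 29 | 0.3%
46 | 17 | 0.2%
47 | 23 | 0.2%
48 | 32 | 0.3%
49 | 14 | 0.1%
50 | 25 | 0.2%
51 | 37 | 0.4%
52 | 60 | 0.6%
53 | 58 | 0.6%

Maximum 10 values
Value | Count | Frequency (%)
192 | 4 | < 0.1%
191 | 5 | 0.1%
190 | 2 | < 0.1%
189 | 4 | < 0.1%
188 | 9 | 0.1%
187 | 8 | 0.1%
186 | 5 | 0.1%
185 | 8 | 0.1%
184 | 8 | 0.1%
183 | 11 | 0.1%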

DiastolicBloodPressure
Real number (ℝ≥0)

Distinct: 90
Distinct (%): 0.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 71.2075
Minimum: 24
Maximum: 117
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 78.2 KiB

Quantile statistics

Minimum: 24
5th percentile: 45
Q1: 58
Median: 72
Q3: 85
95th percentile: 96
Maximum: 117
Range: 93
Interquartile range (IQR): 27

Descriptive statistics

Standard deviation: 16.80147829
Coefficient of variation (CV): 0.2359509643
Kurtosis: -0.8243769924
Mean: 71.2075
Median Absolute Deviation (MAD): 13
Skewness: -0.1056225452
Sum: 712075
Variance: 282.2896727
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
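Common Values
Value | Count | Frequency (%)
81 | 290 | 2.9%
79 | 273 | 2.7%
78 | 271 | 2.7%
83 | 271 | 2.7%
86 | 264 | 2.6%
84 | 259 | 2.6%
87 | 252 | 2.5%
60 | 247 | 2.5%
85 | 247 | 2.5%
80 | 246 | 2.5%
Other values (80) | 7380 | 73.8%

Minimum 10 values
Value | Count | Frequency (%)
24 | 16 | 0.2%
25 | 10 | 0.1%
26 | 8 | 0.1%
27 | 10 | 0.1%
28 | 8 | 0.1%
29 | 3 | < 0.1%
30 | 9 | 0.1%
31 | 5 | 0.1%
32 | 5 | 0.1%
33 | 3 | < 0.1%

Maximum 10 values
Value | Count | Frequency (%)
117 | 2 | < 0.1%
116 | 7 | 0.1%
115 | 7 | 0.1%
114 | 8 | 0.1%
113 | 9 | 0.1%
112 | 1 | < 0.1%
111 | 9 | 0.1%
110 | 2 | < 0.1%
109 | 10 | 0.1%
108 | 12 | 0.1%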

TricepsThickness
Real number (ℝ≥0)

Distinct: 66
Distinct (%): 0.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 28.8176
Minimum: 7
Maximum: 92
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 78.2 KiB

Quantile statistics

Minimum: 7
5th percentile: 8
Q1: 15
Median: 31
Q3: 41
95th percentile: 52
Maximum: 92
Range: 85
Interquartile range (IQR): 26

Descriptive statistics

Standard deviation: 14.50648042
Coefficient of variation (CV): 0.5033896097
Kurtosis: -0.7037530765
Mean: 28.8176
Median Absolute Deviation (MAD): 12
Skewness: 0.1636121117
Sum: 288176
Variance: 210.437974
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
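Common Values
Value | Count | Frequency (%)
11 | 433 | 4.3%
10 | 385 | 3.9%
9 | 383 | 3.8%
34 | 359 | 3.6%
7 | 352 | 3.5%
8 | 336 | 3.4%
45 | 336 | 3.4%
35 | 325 | 3.2%
42 | 323 | 3.2%
44 | 321 | 3.2%
Other values (56) | 6447 | 64.5%

Minimum 10 values
Value | Count | Frequency (%)
7 | 352 | 3.5%
8 | 336 | 3.4%
9 | 383 | 3.8%
10 | 385 | 3.9%
11 | 433 | 4.3%
12 | 254 | 2.5%
13 | 136 | 1.4%
14 | 150 | 1.5%
15 | 243 | 2.4%
16 | 127 | 1.3%

Maximum 10 values
Value | Count | Frequency (%)
92 | 2 | < 0.1%
91 | 2 | < 0.1%
90 | 2 | < 0.1%
89 | 3 | < 0.1%
88 | 3 | < 0.1%
86 | 4 | < 0.1%
75 | 2 | < 0.1%
74 | 4 | < 0.1%
73 | 3 | < 0.1%
72 | 4 | < 0.1%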

SerumInsulin
Real number (ℝ≥0)

Distinct: 620
Distinct (%): 6.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 139.2436
Minimum: 14
Maximum: 796
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 78.2 KiB

Quantile statistics

Minimum: 14
5th percentile: 18
Q1: 39
Median: 85
Q3: 197
95th percentile: 409
Maximum: 796
Range: 782
Interquartile range (IQR): 158

Descriptive statistics

Standard deviation: 133.7779194
Coefficient of variation (CV): 0.9607473476
Kurtosis: 3.567230944
Mean: 139.2436
Median Absolute Deviation (MAD): 62
Skewness: 1.741118017
Sum: 1392436
Variance: 17896.53171
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
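Common Values
Value | Count | Frequency (%)
28 | 115 | 1.1%
23 | 114 | 1.1%
16 | 114 | 1.1%
47 | 111 | 1.1%
27 | 111 | 1.1%
32 | 108 | 1.1%
44 | 107 | 1.1%
43 | 105 | 1.1%
14 | 105 | 1.1%
46 | 104 | 1.0%
Other values (610) | 8906 | 89.1%

Minimum 10 values
Value | Count | Frequency (%)
14 | 105 | 1.1%
15 | 93 | 0.9%
16 | 114 | 1.1%
17 | 99 | 1.0%
18 | 90 | 0.9%
19 | 101 | 1.0%
20 | 86 | 0.9%
21 | 101 | 1.0%
22 | 91 | 0.9%
23 | 114 | 1.1%

Maximum 10 values
Value | Count | Frequency (%)
796 | 1 | < 0.1%
795 | 1 | < 0.1%
793 | 1 | < 0.1%
787 | 1 | < 0.1%
786 | 1 | < 0.1%
783 | 1 | < 0.1%
773 | 1 | < 0.1%
767 | 1 | < 0.1%
762 | 1 | < 0.1%
758 | 1 | < 0.1%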

BMI
Real number (ℝ≥0)

HIGH CORRELATION
UNIQUE

Distinct: 10000
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 31.56702174
Minimum: 18.20080735
Maximum: 56.03462763
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 78.2 KiB

Quantile statistics

Minimum: 18.20080735
5th percentile: 18.93106087
Q1: 21.24742683
Median: 31.92242078
Q3: 39.32892145
95th percentile: 47.10058666
Maximum: 56.03462763
Range: 37.83382028
Interquartile range (IQR): 18.08149461

Descriptive statistics

Standard deviation: 9.804365694
Coefficient of variation (CV): 0.3105888726
Kurtosis: -1.208882423
Mean: 31.56702174
Median Absolute Deviation (MAD): 9.944330905
Skewness: 0.188418633
Sum: 315670.2174
Variance: 96.12558665
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common Values
Value | Count | Frequency (%)
43.50972593 | 1 | < 0.1%
36.00097863 | 1 | < 0.1%
25.54622503 | 1 | < 0.1%
33.92169648 | 1 | < 0.1%
48.1591365 | 1 | < 0.1%
35.71809509 | 1 | < 0.1%
21.81520611 | 1 | < 0.1%
20.62707927 | 1 | < 0.1%
36.12071674 | 1 | < 0.1%
33.81964335 | 1 | < 0.1%
Other values (9990) | 9990 | 99.9%

Minimum 10 values
Value | Count | Frequency (%)
18.20080735 | 1 | < 0.1%
18.20119302 | 1 | < 0.1%
18.20322924 | 1 | < 0.1%
18.20753772 | 1 | < 0.1%
18.20976867 | 1 | < 0.1%
18.21031909 | 1 | < 0.1%
18.21032302 | 1 | < 0.1%
18.21145072 | 1 | < 0.1%
18.213945 | 1 | < 0.1%
18.21428625 | 1 | < 0.1%

Maximum 10 values
Value | Count | Frequency (%)
56.03462763 | 1 | < 0.1%
55.94718308 | 1 | < 0.1%
55.8662382 | 1 | < 0.1%
55.85881276 | 1 | < 0.1%
55.7064282 | 1 | < 0.1%
55.62039267 | 1 | < 0.1%
55.61299774 | 1 | < 0.1%
55.58741669 | 1 | < 0.1%
55.57960103 | 1 | < 0.1%
55.53899604 | 1 | < 0.1%

DiabetesPedigree
Real number (ℝ≥0)

UNIQUE

Distinct: 10000
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.4009437247
Minimum: 0.078043795
Maximum: 2.301594189
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 78.2 KiB

Quantile statistics

Minimum: 0.078043795
5th percentile: 0.09052474235
Q1: 0.1370654103
Median: 0.199698294
Q3: 0.6211583733
95th percentile: 1.146758208
Maximum: 2.301594189
Range: 2.223550394
Interquartile range (IQR): 0.484092963

Descriptive statistics

Standard deviation: 0.38146316
Coefficient of variation (CV): 0.9514132195
Kurtosis: 2.924154025
Mean: 0.4009437247
Median Absolute Deviation (MAD): 0.091999566
Skewness: 1.676804816
Sum: 4009.437247
Variance: 0.1455141424
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common Values
Value | Count | Frequency (%)
1.213191354 | 1 | < 0.1%
0.944162422 | 1 | < 0.1%
1.064307199 | 1 | < 0.1%
0.169939639 | 1 | < 0.1%
0.472797747 | 1 | < 0.1%
0.736663397 | 1 | < 0.1%
0.268128612 | 1 | < 0.1%
0.091008464 | 1 | < 0.1%
0.708723939 | 1 | < 0.1%
0.187784517 | 1 | < 0.1%
Other values (9990) | 9990 | 99.9%

Minimum 10 values
Value | Count | Frequency (%)
0.078043795 | 1 | < 0.1%
0.078082666 | 1 | < 0.1%
0.078092648 | 1 | < 0.1%
0.078107082 | 1 | < 0.1%
0.078169565 | 1 | < 0.1%
0.078176817 | 1 | < 0.1%
0.078181344 | 1 | < 0.1%
0.078236157 | 1 | < 0.1%
0.078242371 | 1 | < 0.1%
0.078250558 | 1 | < 0.1%

Maximum 10 values
Value | Count | Frequency (%)
2.301594189 | 1 | < 0.1%
2.291294242 | 1 | < 0.1%
2.287388168 | 1 | < 0.1%
2.285180184 | 1 | < 0.1%
2.270415383 | 1 | < 0.1%
2.267550416 | 1 | < 0.1%
2.246609162 | 1 | < 0.1%
2.245287697 | 1 | < 0.1%
2.215815235 | 1 | < 0.1%
2.204918601 | 1 | < 0.1%

Age
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 56
Distinct (%): 0.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 30.1341
Minimum: 21
Maximum: 77
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 78.2 KiB

Quantile statistics

Minimum: 21
5th percentile: 21
Q1: 22
Median: 24
Q3: 35
95th percentile: 57
Maximum: 77
Range: 56
Interquartile range (IQR): 13

Descriptive statistics

Standard deviation: 12.10604695
Coefficient of variation (CV): 0.4017391245
Kurtosis: 1.222442785
Mean: 30.1341
Median Absolute Deviation (MAD): 2
Skewness: 1.484823135
Sum: 301341
Variance: 146.5563728
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
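Common Values
Value | Count | Frequency (%)
22 | 1692 | 16.9%
21 | 1670 | 16.7%
23 | 1332 | 13.3%
25 | 682 | 6.8%
24 | 652 | 6.5%
26 | 647 | 6.5%
45 | 203 | 2.0%
46 | 191 | 1.9%
44 | 187 | 1.9%
43 | 185 | 1.8%
Other values (46) | 2559 | 25.6%

Minimum 10 values
Value | Count | Frequency (%)
21 | 1670 | 16.7%
22 | 1692 | 16.9%
23 | 1332 | 13.3%
24 | 652 | 6.5%
25 | 682 | 6.8%
26 | 647 | 6.5%
28 | 23 | 0.2%
29 | 37 | 0.4%
30 | 136 | 1.4%
31 | 125 | 1.2%

Maximum 10 values
Value | Count | Frequency (%)
77 | 3 | < 0.1%
76 | 2 | < 0.1%
75 | 5 | 0.1%
74 | 3 | < 0.1%
73 | 4 | < 0.1%
72 | 3 | < 0.1%
71 | 32 | 0.3%
70 | 23 | 0.2%
69 | 23 | 0.2%
68 | 14 | 0.1%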

Diabetic
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 78.2 KiB
0: 6656
1: 3344

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 10000
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
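Python's standard `unicodedata` module exposes some of these per-code-point properties. A minimal sketch for the two characters of Diabetic: `unicodedata` reports the general category (`Nd`, whose long name is "Decimal Number") but not the script or block, so the ASCII check below is an assumption-level stand-in, a plain code-point range test rather than a real block lookup:

```python
import unicodedata

# Diabetic is stored as the one-character strings "0" and "1".
properties = {}
for ch in "01":
    properties[ch] = {
        "name": unicodedata.name(ch),          # e.g. "DIGIT ZERO"
        "category": unicodedata.category(ch),  # "Nd" = Number, decimal digit
        "ascii": ord(ch) < 128,                # ASCII code points are U+0000..U+007F
    }
    print(repr(ch), properties[ch])
```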

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 1
5th row: 0

Common Values

Value | Count | Frequency (%)
0 | 6656 | 66.6%
1 | 3344 | 33.4%

Length

Histogram of lengths of the category

Pie chart

Value | Count | Frequency (%)
0 | 6656 | 66.6%
1 | 3344 | 33.4%

Most occurring characters

Value | Count | Frequency (%)
0 | 6656 | 66.6%
1 | 3344 | 33.4%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 10000 | 100.0%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 6656 | 66.6%
1 | 3344 | 33.4%

Most occurring scripts

Value | Count | Frequency (%)
Common | 10000 | 100.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
0 | 6656 | 66.6%
1 | 3344 | 33.4%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 10000 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 6656 | 66.6%
1 | 3344 | 33.4%

Interactions

[Figure: pairwise interaction plots of the 9 numeric variables]

Correlations


Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
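A minimal plain-Python sketch of this definition, assuming average ranks for ties (a common convention; the report does not state how ties were ranked). The `pearson`, `average_ranks` and `spearman` helpers are illustrative, not part of any library:

```python
from math import sqrt


def pearson(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sx = sqrt(sum((a - mx) ** 2 for a in x) / n)
    sy = sqrt(sum((b - my) ** 2 for b in y) / n)
    return cov / (sx * sy)


def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of equal values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson's r computed on the rank variables."""
    return pearson(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5, 6]
y = [v ** 3 for v in x]  # monotonic but not linear
print(spearman(x, y), pearson(x, y))
```

For a monotonic but non-linear relationship such as y = x³, ρ is 1 while Pearson's r stays below 1.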

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables (a negative scale factor only flips its sign). Consequently, for an exact linear relationship the slope of the line does not affect r: any non-zero slope gives r = ±1.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
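A minimal plain-Python sketch of this definition, using illustrative values rather than data from this report, that also checks the location and scale behaviour described above:

```python
from math import sqrt


def pearson(x, y):
    """Pearson's r: covariance of x and y divided by the product of their standard deviations.

    The 1/n factors cancel, so using n or n - 1 here gives the same r.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sx = sqrt(sum((a - mx) ** 2 for a in x) / n)
    sy = sqrt(sum((b - my) ** 2 for b in y) / n)
    return cov / (sx * sy)


x = [1.0, 2.0, 4.0, 7.0, 11.0]
y = [2.1, 3.9, 8.2, 13.8, 22.5]  # illustrative values, not taken from this dataset

r = pearson(x, y)
r_rescaled = pearson(x, [3 * b + 10 for b in y])  # positive scale plus shift: r unchanged
r_flipped = pearson(x, [-2 * b for b in y])       # negative scale: only the sign flips
r_line = pearson(x, [0.5 * a - 4 for a in x])     # exact line, any non-zero slope: r = 1
print(r, r_rescaled, r_flipped, r_line)
```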

Kendall's τ

Similarly to Spearman's rank correlation coefficient, Kendall's rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs, n(n − 1)/2 for n observations.
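A minimal sketch of the pair-counting definition above, in the tau-a form that applies no correction for ties. Library implementations commonly compute the tie-adjusted tau-b instead, which differs whenever either variable has repeated values, as most columns in this dataset do; the report does not state which variant was used. `kendall_tau_a` is an illustrative helper, not a library function:

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant pairs - discordant pairs) / total number of pairs.

    Pairs tied in x or in y count as neither concordant nor discordant,
    and no tie correction is applied (the tau-a variant).
    """
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n = len(x)
    return (concordant - discordant) / (n * (n - 1) / 2)


# 2 concordant and 1 discordant pair out of 3 pairs, so tau = 1/3.
print(kendall_tau_a([1, 2, 3], [1, 3, 2]))
```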

Phik (φk)

Phik (φk) is a recent correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. The phik package provides extensive documentation.

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
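This dataset has no missing cells, so both charts above are uniformly filled. To show what they summarize, here is a minimal sketch over a small hypothetical table with missing values (`None`): the nullity matrix marks each cell as present or absent, and the per-column bar chart plots the count of present values.

```python
# Hypothetical records with some missing values (None); not taken from this dataset.
columns = ["PatientID", "BMI", "Age"]
rows = [
    {"PatientID": 1, "BMI": 30.1, "Age": 21},
    {"PatientID": 2, "BMI": None, "Age": 35},
    {"PatientID": 3, "BMI": 27.4, "Age": None},
    {"PatientID": 4, "BMI": None, "Age": 50},
]

# Nullity matrix: one row per record, True where the value is present.
matrix = [[row[c] is not None for c in columns] for row in rows]

# Per-column count of non-missing values, the quantity a nullity bar chart plots.
present = {c: sum(row[c] is not None for row in rows) for c in columns}

print(" ".join(columns))
for flags in matrix:
    print(" ".join("#" if f else "." for f in flags))
print(present)
```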

Sample

First rows

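Index | PatientID | Pregnancies | PlasmaGlucose | DiastolicBloodPressure | TricepsThickness | SerumInsulin | BMI | DiabetesPedigree | Age | Diabetic
0 | 1354778 | 0 | 171 | 80 | 34 | 23 | 43.509726 | 1.213191 | 21 | 0
1 | 1147438 | 8 | 92 | 93 | 47 | 36 | 21.240576 | 0.158365 | 23 | 0
2 | 1640031 | 7 | 115 | 47 | 52 | 35 | 41.511523 | 0.079019 | 23 | 0
3 | 1883350 | 9 | 103 | 78 | 25 | 304 | 29.582192 | 1.282870 | 43 | 1
4 | 1424119 | 1 | 85 | 59 | 27 | 35 | 42.604536 | 0.549542 | 22 | 0
5 | 1619297 | 0 | 82 | 92 | 9 | 253 | 19.724160 | 0.103424 | 26 | 0
6 | 1660149 | 0 | 133 | 47 | 19 | 227 | 21.941357 | 0.174160 | 21 | 0
7 | 1458769 | 0 | 67 | 87 | 43 | 36 | 18.277723 | 0.236165 | 26 | 0
8 | 1201647 | 8 | 80 | 95 | 33 | 24 | 26.624929 | 0.443947 | 53 | 1
9 | 1403912 | 1 | 72 | 31 | 40 | 42 | 36.889576 | 0.103944 | 26 | 0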

Last rows

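Index | PatientID | Pregnancies | PlasmaGlucose | DiastolicBloodPressure | TricepsThickness | SerumInsulin | BMI | DiabetesPedigree | Age | Diabetic
9990 | 1317550 | 0 | 86 | 80 | 10 | 36 | 43.389898 | 0.083597 | 21 | 0
9991 | 1819056 | 4 | 140 | 94 | 25 | 170 | 32.448878 | 0.108273 | 45 | 1
9992 | 1639966 | 4 | 100 | 83 | 34 | 49 | 26.273109 | 0.136661 | 42 | 1
9993 | 1006612 | 1 | 69 | 85 | 17 | 46 | 36.564569 | 0.139280 | 23 | 0
9994 | 1464564 | 0 | 84 | 39 | 35 | 37 | 41.443376 | 0.123610 | 26 | 0
9995 | 1469198 | 6 | 95 | 85 | 37 | 267 | 18.497542 | 0.660240 | 31 | 0
9996 | 1432736 | 0 | 55 | 51 | 7 | 50 | 21.865341 | 0.086589 | 34 | 0
9997 | 1410962 | 5 | 99 | 59 | 47 | 67 | 30.774018 | 2.301594 | 43 | 1
9998 | 1958653 | 0 | 145 | 67 | 30 | 21 | 18.811861 | 0.789572 | 26 | 0
9999 | 1332938 | 10 | 100 | 54 | 34 | 27 | 38.840943 | 0.175465 | 23 | 0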